45 research outputs found

    Continuous Monitoring of Distributed Data Streams over a Time-based Sliding Window

    The past decade has witnessed many interesting algorithms for maintaining statistics over a data stream. This paper initiates a theoretical study of algorithms for monitoring distributed data streams over a time-based sliding window (which contains a variable number of items and possibly out-of-order items). The concern is how to minimize the communication between individual streams and the root, while allowing the root, at any time, to report the global statistics of all streams within a given error bound. This paper presents communication-efficient algorithms for three classical statistics, namely, basic counting, frequent items and quantiles. The worst-case communication cost over a window is $O(\frac{k}{\epsilon} \log \frac{\epsilon N}{k})$ bits for basic counting and $O(\frac{k}{\epsilon} \log \frac{N}{k})$ words for the other two, where $k$ is the number of distributed data streams, $N$ is the total number of items in the streams that arrive or expire in the window, and $\epsilon < 1$ is the desired error bound. Matching and nearly matching lower bounds are also obtained. Comment: 12 pages, to appear in the 27th International Symposium on Theoretical Aspects of Computer Science (STACS), 201
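    The monitoring setting in this abstract can be illustrated with a naive threshold scheme for basic counting. This is only a sketch under stated assumptions, not the paper's algorithm: each stream re-reports its local window count to the root whenever the count has drifted from the last reported value by more than a fraction ϵ of the current count. The names `Stream`, `arrive`, `advance` and `root_estimate` are hypothetical, items are assumed to arrive in timestamp order, and no claim is made that this scheme meets the communication bounds quoted above.

```python
from collections import deque


class Stream:
    """One distributed stream that keeps a time-based sliding window.

    Hypothetical sketch: items are assumed to arrive in timestamp order.
    """

    def __init__(self, window, eps):
        self.window = window    # window length in time units
        self.eps = eps          # relative error tolerated by the root
        self.items = deque()    # timestamps currently inside the window
        self.reported = 0       # last count sent to the root
        self.messages = 0       # number of messages sent so far

    def _maybe_report(self):
        # Send a new count only if the last report is off by more than eps * count.
        count = len(self.items)
        if abs(count - self.reported) > self.eps * count:
            self.reported = count
            self.messages += 1

    def advance(self, now):
        # Expire items whose timestamp falls outside (now - window, now].
        while self.items and self.items[0] <= now - self.window:
            self.items.popleft()
        self._maybe_report()

    def arrive(self, now):
        # Record an item that arrives at time `now`.
        self.advance(now)
        self.items.append(now)
        self._maybe_report()


def root_estimate(streams):
    """Root's view of the global count: the sum of the last reported counts."""
    return sum(s.reported for s in streams)
```

    Once every stream has been advanced to the current time, each reported count is within ϵ times its true local count, so the root's estimate is within ϵ times the true global count.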

    SOAP3-dp: Fast, Accurate and Sensitive GPU-based Short Read Aligner

    To tackle the exponentially increasing throughput of Next-Generation Sequencing (NGS), most of the existing short-read aligners can be configured to favor speed at the expense of accuracy and sensitivity. SOAP3-dp, by leveraging the computational power of both CPU and GPU with optimized algorithms, delivers high speed and sensitivity simultaneously. Compared with widely adopted aligners including BWA, Bowtie2, SeqAlto, GEM and GPU-based aligners including BarraCUDA and CUSHAW, SOAP3-dp is two to tens of times faster, while maintaining the highest sensitivity and lowest false discovery rate (FDR) on Illumina reads with different lengths. Unlike its predecessor SOAP3, which does not allow gapped alignment, SOAP3-dp by default tolerates alignment similarity as low as 60 percent. Real data evaluation using the human genome demonstrates SOAP3-dp's power to enable more authentic variants and longer Indels to be discovered. Fosmid sequencing shows a 9.1 percent FDR on newly discovered deletions. SOAP3-dp natively supports the BAM file format and provides the same scoring scheme as BWA, which enables it to be integrated into existing analysis pipelines. SOAP3-dp has been deployed on Amazon-EC2, NIH-Biowulf and Tianhe-1A. Comment: 21 pages, 6 figures, submitted to PLoS ONE, additional files available at "https://www.dropbox.com/sh/bhclhxpoiubh371/O5CO_CkXQE". Comments most welcome.

    A Gramaticalização do Verbo Ir e a Variação de Formas para Expressar o Futuro do Presente: uma Fotografia Capixaba

    This study examines the stage reached in the grammaticalization of the verb IR ("to go"), which has taken on the role of an auxiliary in periphrastic constructions that express tense. To that end, it investigates the variation between the synthetic form and the periphrastic form with IR in expressing the future tense (futuro do presente). Our hypothesis is that the periphrastic form already reaches all genres of both modalities of the language, since it has already specialized to encode tense. Two genres are examined, taken as prototypical of the oral/written continuum: interviews with university-student informants and newspaper editorials. Starting from a Functionalist theoretical orientation, in a more general framework, language is conceived as flexible to use and open to cognitive, social and also individual influences, although there are forces within it that act to regularize its structure. Following some lines of research that have proved fruitful, the functionalist model will be in dialogue with another model that seeks to account for the structured heterogeneity of language and its processes of change: Variationist Theory. In a more specific framework, the foundations guiding the research are those of Grammaticalization. The data extracted from the selected genres will be submitted to the GOLDVARB 2001 computer program and then interpreted in light of the linguistic theories underlying this research.

    SOAP3-dp: Fast, Accurate and Sensitive GPU-Based Short Read Aligner

    To tackle the exponentially increasing throughput of Next-Generation Sequencing (NGS), most of the existing short-read aligners can be configured to favor speed at the expense of accuracy and sensitivity. SOAP3-dp, by leveraging the computational power of both CPU and GPU with optimized algorithms, delivers high speed and sensitivity simultaneously. Compared with widely adopted aligners including BWA, Bowtie2, SeqAlto, CUSHAW2, GEM and GPU-based aligners BarraCUDA and CUSHAW, SOAP3-dp was found to be two to tens of times faster, while maintaining the highest sensitivity and lowest false discovery rate (FDR) on Illumina reads with different lengths. Unlike its predecessor SOAP3, which does not allow gapped alignment, SOAP3-dp by default tolerates alignment similarity as low as 60%. Real data evaluation using the human genome demonstrates SOAP3-dp's power to enable more authentic variants and longer Indels to be discovered. Fosmid sequencing shows a 9.1% FDR on newly discovered deletions. SOAP3-dp natively supports the BAM file format and provides the same scoring scheme as BWA, which enables it to be integrated into existing analysis pipelines. SOAP3-dp has been deployed on Amazon-EC2, NIH-Biowulf and Tianhe-1A.

    Pain Controlling and Cytokine-regulating Effects of Lyprinol, a Lipid Extract of Perna Canaliculus, in a Rat Adjuvant-induced Arthritis Model

    Using an adjuvant-induced arthritis rat model, we investigated the effects of a lipid extract of Perna canaliculus (Lyprinol®) on pain. Radiological examinations were performed and levels of pro- and anti-inflammatory (AI) cytokines were measured to provide independent objective data for the pain-control investigation. We confirmed the ability of Lyprinol® to control pain in the initial phase of its administration, with efficacy similar to that observed with Naproxen. The pain scores slowly increased again in the group of rats treated with Lyprinol® after day 9–14. The Naproxen-treated rats remained pain-free while treated. Both Naproxen and Lyprinol® decreased the levels of the pro-inflammatory cytokines TNF-α and IFN-γ, and increased that of IL-10. Extra-virgin olive oil had no effect on cytokine secretion. Rats treated with Lyprinol® were apparently cured after 1 year. This study confirms the AI efficacy of this lipid extract of P. canaliculus, its initial analgesic effect, its perfect tolerance and its long-term healing properties.

    Flow Dominance and Factorization of Transverse Momentum Correlations in Pb-Pb Collisions at the LHC

    We present the first measurement of the two-particle transverse momentum differential correlation function, $P_2 \equiv \langle \Delta p_T \Delta p_T \rangle / \langle p_T \rangle^2$, in Pb-Pb collisions at $\sqrt{s_{NN}} = 2.76$ TeV. Results for $P_2$ are reported as a function of the relative pseudorapidity (Δη) and azimuthal angle (Δφ) between two particles for different collision centralities. The Δφ dependence is found to be largely independent of Δη for |Δη|≥0.9. In the 5% most central Pb-Pb collisions, the two-particle transverse momentum correlation function exhibits a clear double-hump structure around Δφ=π (i.e., on the away side), which is not observed in number correlations in the same centrality range, and thus provides an indication of the dominance of triangular flow in this collision centrality. Fourier decompositions of $P_2$, studied as a function of the collision centrality, show that correlations at |Δη|≥0.9 can be well reproduced by a flow ansatz based on the notion that measured transverse momentum correlations are strictly determined by the collective motion of the system.

    New results on online job scheduling and data stream algorithms

    Published or final version. Computer Science. Doctoral. Doctor of Philosophy.

    An Automatic Question Generator for Chinese Comprehension

    Question generation (QG) is a natural language processing (NLP) problem that aims to generate natural questions from a given sentence or paragraph. QG has many applications, especially in education. For example, QG can complement teachers’ efforts in creating assessment materials by automatically generating many related questions. QG can also be used to generate frequently asked question (FAQ) sets for business. Question answering (QA) can benefit from QG, where the training dataset of QA can be enriched using QG to improve the learning and performance of QA algorithms. However, most of the existing works and tools in QG are designed for English text. This paper presents the design of a web-based question generator for Chinese comprehension. The generator provides a user-friendly web interface for users to generate a set of wh-questions (i.e., what, who, when, where, why, and how) based on a Chinese text conditioned on a corresponding set of answer phrases. The web interface allows users to easily refine the answer phrases that are automatically generated by the web generator. The underlying question generation is based on the transformer approach, which was trained on a dataset combined from three publicly available Chinese reading comprehension datasets, namely, DRUD, CMRC2017, and CMRC2018. Linguistic features such as parts of speech (POS) and named-entity recognition (NER) are extracted from the text, which together with the original text and the answer phrases, are then fed into a machine learning algorithm based on a pre-trained mT5 model. The generated questions with answers are displayed in a user-friendly format, supplemented with the source sentences in the text used for generating each question. We expect the design of this web tool to provide insight into how Chinese question generation can be made easily accessible to users with low computer literacy.

    Scheduling for weighted flow time and energy with rejection penalty

    This paper revisits the online problem of flow-time scheduling on a single processor when jobs can be rejected at some penalty [4]. The user cost of a job is defined as the weighted flow time of the job plus the penalty if it is rejected before completion. For jobs with arbitrary weights and arbitrary penalties, Bansal et al. [4] gave an online algorithm that is O((log W + log C)^2)-competitive for minimizing the total user cost when using a slightly faster processor, where W and C are the max-min ratios of job weights and job penalties, respectively. In this paper we improve this result with a new algorithm that achieves a constant competitive ratio independent of W and C when using a slightly faster processor. Note that the above results assume a processor running at a fixed speed. This paper shows more interesting results on extending the above study to the dynamic speed scaling model, where the processor can vary the speed dynamically and the rate of energy consumption is a cubic or any increasing function of speed. A scheduling algorithm has to control job admission and determine the order and speed of job execution. This paper studies the tradeoff between the above-mentioned user cost and energy, and it shows two O(1)-competitive algorithms and a lower bound result on minimizing the user cost plus energy. These algorithms can also be regarded as a generalization of the recent work on minimizing flow time plus energy when all jobs must be completed (see the survey paper [1]).
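    The objective described in this abstract can be made concrete with a small sketch. It only evaluates the cost of a schedule that has already been decided; it is not either of the paper's algorithms. The names `Job`, `user_cost`, `total_user_cost` and `energy` are hypothetical, the cubic power function is one of the cases the abstract mentions, and the code assumes that a rejected job accrues flow time until the moment it is rejected.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Job:
    release: float                       # time the job arrives
    weight: float                        # weight applied to its flow time
    penalty: float                       # cost charged if the job is rejected
    finish: Optional[float] = None       # completion time, if completed
    rejected_at: Optional[float] = None  # rejection time, if rejected


def user_cost(job: Job) -> float:
    """Weighted flow time, plus the penalty if the job was rejected."""
    if job.rejected_at is not None:
        # Assumption: flow time of a rejected job runs until its rejection.
        return job.weight * (job.rejected_at - job.release) + job.penalty
    if job.finish is None:
        raise ValueError("job is neither completed nor rejected")
    return job.weight * (job.finish - job.release)


def total_user_cost(jobs: Iterable[Job]) -> float:
    return sum(user_cost(j) for j in jobs)


def energy(segments: Iterable[Tuple[float, float]]) -> float:
    """Energy of (speed, duration) segments under the cubic power model."""
    return sum(speed ** 3 * duration for speed, duration in segments)


jobs = [
    Job(release=0.0, weight=2.0, penalty=5.0, finish=3.0),       # completed
    Job(release=1.0, weight=1.0, penalty=4.0, rejected_at=2.0),  # rejected
]
objective = total_user_cost(jobs) + energy([(2.0, 1.5)])
```

    In the speed scaling model, the quantity to minimize is the sum computed in `objective`: total user cost plus energy.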